Towards A Framework For Learning Structured Shape Models From Text-Annotated Images

Authors

Sven Wachsmuth

Suzanne Stevenson

Sven Dickinson

Abstract

We present ongoing work on learning translation models between image data and text (English) captions. Most approaches to this problem assume a one-to-one or a flat, one-to-many mapping between a segmented image region and a word. However, this assumption is very restrictive from the computer vision standpoint and fails to account for two important properties of image segmentation: 1) objects often consist of multiple parts, each captured by an individual region; and 2) individual regions are often over-segmented into multiple subregions. The assumption also fails to capture structural relations among words, e.g., part/whole relations. We outline a general framework that accommodates a many-to-many mapping between image regions and words, allowing for structured descriptions on both sides. In this paper, we describe our extensions to the probabilistic translation model of Brown et al. (1993) (as in Duygulu et al. (2002)) that enable the creation of structured models of image objects. We demonstrate our work in progress, in which a set of annotated images is used to derive a set of labeled, structured descriptions in the presence of over-segmentation.
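For orientation, the flat mapping that the abstract takes as its starting point can be sketched as an IBM Model 1 style EM loop that estimates t(word | region) from co-occurrence alone. This is a minimal illustration of that baseline, not the authors' structured extension. The function name `train_model1`, the region tokens (`blob_a`, `blob_b`, `blob_c`), and the toy captions are hypothetical, and the sketch assumes each region has already been quantized into a discrete token.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t[(word, region)] ~ P(word | region) with EM.

    pairs: list of (region_tokens, caption_words), one entry per image.
    Assumption: every caption word is generated by exactly one region
    of the same image (the flat mapping, no part/whole structure).
    """
    vocab = {w for _, words in pairs for w in words}
    # Uniform initialisation over the caption vocabulary.
    t = defaultdict(lambda: 1.0 / len(vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected (word, region) alignments
        total = defaultdict(float)  # expected alignments per region
        for regions, words in pairs:
            for w in words:
                # E-step: distribute the word over the image's regions.
                norm = sum(t[(w, r)] for r in regions)
                for r in regions:
                    frac = t[(w, r)] / norm
                    count[(w, r)] += frac
                    total[r] += frac
        # M-step: renormalise the expected counts per region.
        t = defaultdict(
            float, {(w, r): c / total[r] for (w, r), c in count.items()}
        )
    return t


# Hypothetical toy corpus: region tokens paired with caption words.
pairs = [
    (["blob_a", "blob_b"], ["tiger", "grass"]),
    (["blob_b"], ["grass"]),
    (["blob_a", "blob_c"], ["tiger", "water"]),
    (["blob_c"], ["water"]),
]
t = train_model1(pairs)
best = max(["tiger", "grass", "water"], key=lambda w: t[(w, "blob_a")])
print(best)
```

In this toy corpus the single-region images disambiguate the two-region ones, so `blob_a` ends up most strongly associated with "tiger". A model of this kind assigns one region per word, so it cannot represent an object composed of several parts or a region split by over-segmentation, which is the limitation the abstract sets out to address.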

Similar resources

Sparse Structured Principal Component Analysis and Model Learning for Classification and Quality Detection of Rice Grains

In scientific and commercial fields associated with modern agriculture, the categorization of different rice types and determination of its quality is very important. Various image processing algorithms are applied in recent years to detect different agricultural products. The problem of rice classification and quality detection in this paper is presented based on model learning concepts includ...

Improvement of generative adversarial networks for automatic text-to-image generation

This research is related to the use of deep learning tools and image processing technology in the automatic generation of images from text. Previous researches have used one sentence to produce images. In this research, a memory-based hierarchical model is presented that uses three different descriptions that are presented in the form of sentences to produce and improve the image. The proposed ...

Towards Free-text Semantic Parsing: A Unified Framework Based on FrameNet, VerbNet and PropBank

This article describes a robust semantic parser that uses a broad knowledge base created by interconnecting three major resources: FrameNet, VerbNet and PropBank. The FrameNet corpus contains the examples annotated with semantic roles whereas the VerbNet lexicon provides the knowledge about the syntactic behavior of the verbs. We connect VerbNet and FrameNet by mapping the FrameNet frames to th...

Structured Literature Image Finder: Extracting Information from Text and Images in Biomedical Literature

Slif uses a combination of text-mining and image processing to extract information from figures in the biomedical literature. It also uses innovative extensions to traditional latent topic modeling to provide new ways to traverse the literature. Slif originally focused on fluorescence microscopy images. We have now extended it to classify panels into more image types. We also improved the class...

Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...

Publication date: 2003
